Clustering using Vector Membership: An Extension of the Fuzzy C-Means Algorithm
Clustering is an important facet of exploratory data mining and finds
extensive use in several fields. In this paper, we propose an extension of the
classical Fuzzy C-Means clustering algorithm. The proposed algorithm,
abbreviated as VFC, adopts a multi-dimensional membership vector for each data
point instead of the traditional, scalar membership value defined in the
original algorithm. The membership vector for each point is obtained by
considering each feature of that point separately and computing an individual
membership value for each. We also propose an algorithm to efficiently
allocate the initial cluster centers close to the actual centers, so as to
facilitate rapid convergence. Further, we propose a scheme to achieve crisp
clustering using the VFC algorithm. The proposed clustering scheme has
been tested on two standard data sets to analyze its performance. We
also examine the efficacy of the proposed scheme by analyzing its performance
on image segmentation examples and comparing it with the classical Fuzzy
C-means clustering algorithm.
Comment: 6 pages, 8 figures and 1 table (Conference Paper)
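As a rough illustration of the vector-membership idea (not necessarily the paper's exact formulation), the standard FCM membership update can be applied independently along each feature, yielding a per-cluster membership vector for every point instead of a scalar. The function name and fuzzifier default below are assumptions for the sketch.

```python
import numpy as np

def feature_memberships(X, centers, m=2.0, eps=1e-9):
    """Per-feature fuzzy memberships: for each point, cluster, and feature,
    apply the classical FCM membership formula using the 1-D distance along
    that feature alone (illustrative sketch of vector membership)."""
    # dist[i, j, f] = |x_if - c_jf|, distance along each feature separately
    dist = np.abs(X[:, None, :] - centers[None, :, :]) + eps   # (n, c, d)
    # standard FCM membership formula, applied independently per feature
    ratio = dist[:, :, None, :] / dist[:, None, :, :]          # (n, c, c, d)
    u = 1.0 / np.sum(ratio ** (2.0 / (m - 1.0)), axis=2)       # (n, c, d)
    return u  # memberships sum to 1 over clusters, for every feature
```

For each feature, the memberships across clusters still sum to one, so the output is a valid membership vector per point-cluster pair.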
Signal Processing Grand Challenge 2023 -- e-Prevention: Sleep Behavior as an Indicator of Relapses in Psychotic Patients
This paper presents the approach and results of USC SAIL's submission to the
Signal Processing Grand Challenge 2023 - e-Prevention (Task 2), on detecting
relapses in psychotic patients. Relapse prediction has proven to be
challenging, primarily due to the heterogeneity of symptoms and responses to
treatment between individuals. We address these challenges by investigating the
use of sleep behavior features to estimate relapse days as outliers in an
unsupervised machine learning setting. We extract informative features from
human activity and heart rate data collected in the wild, and evaluate various
combinations of feature types and time resolutions. We find that short-time
sleep behavior features outperform both their awake counterparts and features
computed over larger time intervals. Our submission ranked 3rd on the task's official leaderboard,
demonstrating the potential of such features as an objective and non-invasive
predictor of psychotic relapses.
Comment: 2 pages, 1 table, ICASSP 2023, Grand Challenges Track
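The core idea of flagging relapse days as outliers relative to a subject's own baseline can be sketched with a simple robust z-score over daily sleep features. This is a minimal illustration, not the challenge entry's actual outlier detector; the feature layout (rows = days, columns = features) is an assumption.

```python
import numpy as np

def relapse_outlier_scores(daily_features):
    """Score each day as an outlier against the subject's own baseline,
    using a median/MAD robust z-score (illustrative sketch only)."""
    X = np.asarray(daily_features, dtype=float)   # rows = days, cols = features
    med = np.median(X, axis=0)
    mad = np.median(np.abs(X - med), axis=0) + 1e-9  # robust spread per feature
    z = np.abs(X - med) / mad
    return z.mean(axis=1)  # higher score = more anomalous day
```

Because it is unsupervised and per-subject, such a scorer needs no relapse labels at training time, matching the abstract's setting.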
Unlocking Foundation Models for Privacy-Enhancing Speech Understanding: An Early Study on Low Resource Speech Training Leveraging Label-guided Synthetic Speech Content
Automatic Speech Understanding (ASU) leverages the power of deep learning
models for accurate interpretation of human speech, leading to a wide range of
speech applications that enrich the human experience. However, training a
robust ASU model requires the curation of a large number of speech samples,
creating risks for privacy breaches. In this work, we investigate using
foundation models to assist privacy-enhancing speech computing. Unlike
conventional works focusing primarily on data perturbation or distributed
algorithms, our work studies the possibilities of using pre-trained generative
models to synthesize speech content as training data with just label guidance.
We show that zero-shot learning with training label-guided synthetic speech
content remains a challenging task. On the other hand, our results demonstrate
that the model trained with synthetic speech samples provides an effective
initialization point for low-resource ASU training. This result reveals the
potential to enhance privacy by reducing user data collection and instead using
label-guided synthetic speech content.
A dataset for Audio-Visual Sound Event Detection in Movies
Audio event detection is a widely studied audio processing task, with
applications ranging from self-driving cars to healthcare. In-the-wild datasets
such as Audioset have propelled research in this field. However, many efforts
typically involve manual annotation and verification, which is expensive to
perform at scale. Movies depict various real-life and fictional scenarios, which
makes them a rich resource for mining a wide range of audio events. In this
work, we present a dataset of audio events called Subtitle-Aligned Movie Sounds
(SAM-S). We use publicly-available closed-caption transcripts to automatically
mine over 110K audio events from 430 movies. We identify three dimensions for
categorizing audio events (sound, source, and quality) and present the steps
involved in producing a final taxonomy of 245 sounds. We discuss the choices involved in
generating the taxonomy, and also highlight the human-centered nature of sounds
in our dataset. We establish a baseline performance for audio-only sound
classification of 34.76% mean average precision and show that incorporating
visual information can further improve the performance by about 5%. Data and
code are made available for research at
https://github.com/usc-sail/mica-subtitle-aligned-movie-sound
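Closed captions conventionally mark non-speech sounds in brackets or parentheses (e.g. "[DOG BARKING]", "(door slams)"). A hypothetical extraction step in that spirit can be sketched as below; the actual SAM-S mining rules are not specified in the abstract, and the pattern here is an assumption.

```python
import re

# Non-speech cues in closed captions are often bracketed or parenthesized.
# This sketch pulls out such cues and normalizes them to lowercase.
CUE = re.compile(r"[\[\(]([^\]\)]+)[\]\)]")

def mine_sound_cues(subtitle_lines):
    cues = []
    for line in subtitle_lines:
        for match in CUE.findall(line):
            cues.append(match.strip().lower())
    return cues
```

Running this over subtitle files would yield raw cue strings, which a taxonomy step (like the 245-sound taxonomy described above) could then cluster and label.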
Does Video Summarization Require Videos? Quantifying the Effectiveness of Language in Video Summarization
Video summarization remains a significant challenge in computer vision due to
the sheer size of the input videos to be summarized. We propose an efficient,
language-only video summarizer that achieves competitive accuracy with high
data efficiency. Using only textual captions obtained via a zero-shot approach,
we train a language transformer model and forego image representations. This
method allows us to filter among the representative text vectors and condense
the sequence. Our approach also provides explainability through natural
language, which is readily interpretable by humans and yields textual
summaries of the videos. An ablation study that focuses on modality and data
compression shows that leveraging text modality only effectively reduces input
data processing while retaining comparable results.
Comment: © 2024 IEEE.
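Selecting a condensed set of representative text vectors can be sketched as a greedy facility-location pick over caption embeddings. The selection rule below is an assumption for illustration; the paper's exact filtration method is not given in the abstract.

```python
import numpy as np

def select_representative(caption_vecs, k=3):
    """Greedily pick k captions whose embeddings best cover the rest,
    via facility-location gains on cosine similarity (illustrative sketch)."""
    V = np.asarray(caption_vecs, dtype=float)
    V = V / (np.linalg.norm(V, axis=1, keepdims=True) + 1e-9)
    sim = V @ V.T                              # cosine similarity matrix
    chosen, covered = [], np.zeros(len(V))
    for _ in range(min(k, len(V))):
        # marginal coverage gain of adding each candidate caption
        gains = np.maximum(sim, covered).sum(axis=1) - covered.sum()
        gains[chosen] = -np.inf                # never re-pick a caption
        best = int(np.argmax(gains))
        chosen.append(best)
        covered = np.maximum(covered, sim[best])
    return sorted(chosen)
```

The chosen caption indices would then form the condensed sequence fed to the language transformer, keeping the pipeline entirely text-based.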